Different Estimation Procedures
The Rasch Model and the Marginal Rasch Model

Over the past decades, the Rasch (1960) model has become
increasingly popular and has proved to be very useful in the
theory and analysis of mental tests. The main reasons for the
popularity of the Rasch model are the simplicity of the
model, as compared with other item response models, and the
existence of attractive statistical procedures for estimating
its parameters. Therefore, in the second section we shall
briefly review the different estimation procedures that have
previously been used with the Rasch model. These procedures
include maximum likelihood, Bayesian, minimum chi-square and
pairwise comparison estimation. The third section consists of
a comparison of marginal maximum likelihood estimation with
all other estimation procedures.

Estimation Procedures in the Rasch Model 

We start with a test battery consisting of k 
dichotomously scored items, for which we assume that they all 
measure the same unidimensional (latent) trait or ability. 

Under the Rasch model, the probability that examinee v
with ability $\theta$ answers item i correctly is given by:

(1) $P(X_i = 1 \mid \theta, \epsilon_i) = \theta\epsilon_i / (1 + \theta\epsilon_i)$,

where $\epsilon_i$ is the item easiness parameter.

On the usual assumption that the items are locally
(conditionally) independent, the probability that examinee v
has the response pattern $x = (x_1, \ldots, x_k)$ is given by

(2) $P(X = x \mid \theta, \epsilon) = \prod_{i=1}^{k} (\theta\epsilon_i)^{x_i} / (1 + \theta\epsilon_i)$,

where $x_i = 1$ if item i is answered correctly and $x_i = 0$
otherwise, and $\epsilon = (\epsilon_1, \ldots, \epsilon_k)$.

If independence applies at the person level, i.e., if all
examinees answer the items independently of each other, the
joint probability of the response patterns for all N
examinees can be written as:

(3) $P(X_1 = x_1, \ldots, X_N = x_N \mid \theta, \epsilon) = \prod_{v=1}^{N} \prod_{i=1}^{k} (\theta_v\epsilon_i)^{x_{vi}} / (1 + \theta_v\epsilon_i)$,

where $\theta_v$ is the ability of examinee v,
$\theta = (\theta_1, \ldots, \theta_N)$, and $x_{vi} = 1$ if
person v answers item i correctly and 0 otherwise.

At this point it is important to mention that it is not
necessary that all examinees are administered the same set of
items. It may happen that examinee one answers item 2 and one
other item, whilst examinee two has items 2, 3 and 5 to solve.
If this is the case, one speaks of an incomplete design. If the
mechanism by which the items are administered is ignorable
with respect to likelihood inference (Rubin, 1976), as is the
case if, for example, items are administered randomly to
persons, all estimation procedures that are treated in the
following sections will be applicable to this special case
with the necessary adjustments. The main adjustment concerns
the introduction of a random matrix D, with $d_{ij} = 1$ if
person i has been administered item j, and $d_{ij} = 0$
otherwise. According to the value of $d_{ij}$, the likelihood
function (3) is changed appropriately. It is not yet clear
whether adaptive and customized testing influence the
likelihood. Although the estimation procedures are identical
for both designs, it is important to stress that the error of
estimation in the item parameters can be larger in the
incomplete design, since fewer examinees respond to any
particular item. Since incomplete designs are otherwise very
comparable with complete designs, we will confine ourselves
to complete designs.

Note that the Rasch model is unidentifiable: if the item
parameters are all multiplied by a constant c, and if all
person parameters are divided by that same constant, the
probability statement in (3) does not change. For this
reason, a constraint has to be imposed on the set of
parameters. The choice of the appropriate constraint depends
on the particular problem at hand and will therefore be
imposed where it is needed.
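
The indeterminacy can be checked numerically. The following is only a minimal sketch, assuming the multiplicative form of (1), i.e. a success probability of theta * eps / (1 + theta * eps); the function names and the numerical values are hypothetical and chosen for illustration only.

```python
def rasch_prob(theta, eps):
    """Success probability for ability theta and item easiness eps (form (1))."""
    return theta * eps / (1.0 + theta * eps)


def pattern_prob(theta, eps_list, x):
    """Probability (2) of response pattern x under local independence."""
    p = 1.0
    for eps, xi in zip(eps_list, x):
        p_i = rasch_prob(theta, eps)
        p *= p_i if xi == 1 else 1.0 - p_i
    return p


# Hypothetical values: three items, one examinee, scaling constant c.
eps = [0.5, 1.0, 2.0]
theta = 1.5
c = 4.0
x = [1, 0, 1]

p_original = pattern_prob(theta, eps, x)
# Multiply all item parameters by c and divide the ability by c.
p_rescaled = pattern_prob(theta / c, [c * e for e in eps], x)
print(p_original, p_rescaled)  # the two values agree up to rounding error
```

Since only the products theta * eps enter the model, any constraint that fixes the scale of the item parameters removes the indeterminacy.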

In this paper it is assumed that both sets of
parameters, i.e., item and person parameters, are unknown,
and that they all have to be estimated from the data. For
this purpose, the following estimation procedures are
available: maximum likelihood (unconditional or joint,
conditional and marginal), Bayesian (hierarchical and
marginal hierarchical), pairwise comparison and minimum
chi-square estimation.

Since maximum likelihood and Bayesian estimation
procedures have a general nature and can be applied in many
different settings, these two estimation procedures will now
be discussed in more detail for the general case.

To this end, assume that the random variables
$X_1, \ldots, X_n$ are independent and identically distributed
with density $f(x \mid \alpha)$, where $\alpha$ is unknown and
possibly vector valued. Maximum likelihood estimation is based
on the principle that one should pick that value of $\alpha$
that makes the observed data most probable. To achieve this,
the likelihood function is maximized with respect to the
unknown parameter $\alpha$. If the likelihood function is
sufficiently smooth, as will often be the case, this can be
done by differentiating the likelihood function, or
equivalently the loglikelihood, with respect to $\alpha$,
equating the derivative to zero, and finally solving the
resulting equation(s). Under the (mild) assumption that the
density function $f(x \mid \alpha)$ satisfies certain
regularity conditions, maximum likelihood estimates have
attractive large-sample properties. First, maximum likelihood
estimates are consistent, i.e., the estimates converge to the
true parameter. Secondly, the maximum likelihood estimator of
$\alpha$ is asymptotically normally distributed with mean
$\alpha$ and with a variance-covariance matrix equal to the
inverse of the Fisher information matrix. Thirdly, the maximum
likelihood estimator is efficient, i.e., the data are used in
an optimal way. Furthermore, if the density function f belongs
to an exponential family, maximum likelihood estimators are
asymptotically equivalent to uniform minimum variance
unbiased (UMVU) estimators.

In Bayesian estimation, it is additionally assumed that
the parameter $\alpha$ itself is random. Note that Bayesian
techniques can also be interpreted from a frequentist point
of view (Box & Tiao, 1973). The distribution of the parameter
$\alpha$, then, expresses the belief of the researcher in the
possible values of $\alpha$. This distribution of $\alpha$ is
chosen prior to the observation of the data, and is therefore
termed the a priori distribution. After having observed the
data, one can compute, with the help of Bayes' rule, the
posterior distribution. This posterior distribution is
proportional to the product of the prior distribution and the
likelihood function, and incorporates all information that is
available for the unknown parameter $\alpha$. Assuming that
the prior distribution is characterized by a parameter
$\Omega$, the objective is now to estimate this parameter
$\Omega$. In order to estimate this unknown parameter, one
could use a maximum likelihood approach, i.e., use that
estimator for $\Omega$ that maximizes the posterior
distribution, or equivalently, the mode of the posterior
distribution. Since the posterior is a probability
distribution, however, one could also use the median or the
mean of the posterior to estimate the unknown parameter
$\Omega$. It depends on the nature of the problem which of
these methods should be applied, although modal Bayes
estimation seems to do the best job in most cases
(O'Hagan, 1976).

It is important to note that maximum likelihood and
Bayes estimation are equivalent if the sample size is large,
since in that case the prior information used in the
Bayesian estimation plays an insignificant role. These two
procedures are also equivalent if the prior distribution is
non-informative with respect to the unknown parameter, i.e.,
in the case of a flat prior (Lehmann, 1983).
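
Both statements can be illustrated with a deliberately simple example that is not part of the Rasch setting: a binomial success probability with a Beta(a, b) prior, for which the posterior mode is available in closed form. The sketch below is only an illustration under that assumption; the function names and the numbers are hypothetical.

```python
def posterior_mode(successes, n, a, b):
    """Mode of the Beta(a + s, b + n - s) posterior of a binomial probability.

    Valid when both posterior shape parameters exceed one.
    """
    return (successes + a - 1.0) / (n + a + b - 2.0)


def mle(successes, n):
    """Maximum likelihood estimate of a binomial probability."""
    return successes / n


# Flat prior Beta(1, 1): the posterior mode coincides with the ML estimate.
flat = posterior_mode(7, 10, 1.0, 1.0)
# Informative prior Beta(5, 5) with a small sample: the mode is pulled to 0.5.
small = posterior_mode(7, 10, 5.0, 5.0)
# Same prior with a large sample: the prior plays an insignificant role.
large = posterior_mode(7000, 10000, 5.0, 5.0)
print(flat, small, large, mle(7, 10))
```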



Joint Maximum Likelihood (JML)

In this method, also called unconditional maximum likelihood,
both sets of parameters are estimated simultaneously. This is
done by maximizing the joint likelihood function (3) over all
the parameters. An estimate of a parameter can be found by
differentiating (the logarithm of) (3) with respect to that
parameter, equating this derivative to zero and solving the
resulting equation. For the Rasch model the resulting set of
equations is given by:

(4) $\sum_{i=1}^{k} x_{vi} = \sum_{i=1}^{k} \theta_v\epsilon_i / (1 + \theta_v\epsilon_i)$ for all $v = 1, \ldots, N$,

    $\sum_{v=1}^{N} x_{vi} = \sum_{v=1}^{N} \theta_v\epsilon_i / (1 + \theta_v\epsilon_i)$ for all $i = 1, \ldots, k$.

Note that (4) consists of implicit equations, and hence that
iterative procedures have to be used to solve (4).
Furthermore, the equations in (4) have the well-known form:
"observed" = "expected".

Since the same set of items may be administered to
different populations, the constraint that is most appealing
in this setting is one on the item parameters. Two
constraints have been used: $\prod_{i=1}^{k} \epsilon_i = 1$
and $\epsilon_1 = 1$. The latter constraint has the
disadvantage that the standard errors of the item parameter
estimates will be larger than under the first constraint
(de Gruijter, personal communication). Therefore, the
constraint $\prod_{i=1}^{k} \epsilon_i = 1$ will be used.
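
The following is a minimal sketch of one possible iterative scheme for the "observed" = "expected" equations of joint maximum likelihood, under the constraint that the product of the item parameters equals one. It assumes the multiplicative form theta * eps / (1 + theta * eps) of the model; the simple fixed-point updates, the function names and the small data set are our own illustrative choices, not the procedure of any of the cited authors. Persons and items with extreme scores are assumed to have been removed beforehand.

```python
import math


def expected(theta_v, eps_i):
    """Expected score of a person with ability theta_v on an item with easiness eps_i."""
    return theta_v * eps_i / (1.0 + theta_v * eps_i)


def residuals(data, theta, eps):
    """Observed minus expected scores, first for persons, then for items."""
    person = [sum(row) - sum(expected(t, e) for e in eps)
              for row, t in zip(data, theta)]
    item = [sum(row[i] for row in data) - sum(expected(t, eps[i]) for t in theta)
            for i in range(len(eps))]
    return person, item


def jml(data, iterations=5000):
    """Alternate fixed-point updates of person and item parameters."""
    n, k = len(data), len(data[0])
    r = [sum(row) for row in data]                        # person total scores
    s = [sum(row[i] for row in data) for i in range(k)]   # item total scores
    theta, eps = [1.0] * n, [1.0] * k
    for _ in range(iterations):
        theta = [r[v] / sum(e / (1.0 + theta[v] * e) for e in eps)
                 for v in range(n)]
        eps = [s[i] / sum(t / (1.0 + t * eps[i]) for t in theta)
               for i in range(k)]
        # Impose prod(eps) = 1; rescaling theta leaves the likelihood unchanged.
        c = math.exp(sum(math.log(e) for e in eps) / k)
        eps = [e / c for e in eps]
        theta = [t * c for t in theta]
    return theta, eps


# Hypothetical responses of four examinees to three items (no extreme scores).
data = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1]]
theta_hat, eps_hat = jml(data)
person_res, item_res = residuals(data, theta_hat, eps_hat)
print(eps_hat, max(abs(res) for res in person_res + item_res))
```

At convergence the residuals vanish, which is exactly the "observed" = "expected" property; the sketch says nothing about the consistency of the resulting estimates.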

A serious problem with joint maximum likelihood
estimation is the fact that the item parameters are not
estimated consistently. This is due to the fact that we have
a problem with structural and incidental parameters (Neyman
and Scott, 1948). These problems are the result of the fact
that with the introduction of another examinee, we also
introduce a new person parameter. This has the effect that
the number of parameters increases indefinitely, so that
standard maximum likelihood theory does not apply in this
case. A heuristic interpretation of this phenomenon is the
following: although with each new person we get additional
information about the item parameters, we also introduce bias
since the person parameter is not known. Solutions to the
general problem have been given by Kiefer and Wolfowitz
(1956), Lehmann (1959), Rasch (1960) and Andersen (1970,
1973).

Kiefer and Wolfowitz (1956) showed that one can
consistently estimate the structural parameters if one
assumes that the incidental parameters are independent and
identically distributed. Furthermore, this distribution can
also be estimated consistently. Engelen (1987) used this for
the special case of the Rasch model, and called this
semiparametric estimation. This will be discussed more
extensively in the section on marginal maximum likelihood.

Using Lehmann's (1959) notion of conditional estimation,
Andersen (1970, 1973) proved that a solution to the problem
of structural and incidental parameters can be given if
there exist 'sufficient' statistics for the incidental
parameters that do not depend on the structural parameters.
Note that this was the most important assumption that led to
the Rasch model (Rasch, 1960). This solution has been termed
'conditional maximum likelihood estimation' by Andersen and
will be discussed in greater detail below.

The most famous example of structural and incidental
parameters has been given by Neyman and Scott (1948). They
considered a sequence of independent normally distributed
random variables $X_{ij}$, $i = 1, \ldots, n$,
$j = 1, \ldots, k$, such that all $X_{ij}$ have mean $\mu_i$
and variance $\sigma^2$. They showed that the maximum
likelihood estimate of $\sigma^2$ is inconsistent (it
converges to $\sigma^2 (k-1)/k$), and that it can be adjusted
by means of the factor (k-1)/k to yield a consistent
estimate. It was long believed that this factor could also
be used in the Rasch model, i.e., that if the estimates of
$\epsilon_i$ were multiplied by (k-1)/k, a consistent
estimate would be the result (Wright & Douglas, 1977;
Andersen, 1980). However, a proof of this has never been
given, since Andersen's proof only applies to the special
case k=2; no generalization of this proof to larger k has
been given up to now. In a simulation study by van den
Wollenberg (1986) it was shown that the factor (k-1)/k does
not apply for k>2. Even stronger, van den Wollenberg showed
that there does not exist a universal factor to adjust the
estimate of $\epsilon_i$ in order to get a consistent
estimate. This factor would have to depend on the
distribution of the item difficulties and on the ability
distribution.
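
The Neyman and Scott example is easy to reproduce by simulation. The sketch below is only an illustration: the group means are drawn from an arbitrary uniform distribution, and the sample sizes, the seed and the function name are hypothetical. It shows that the maximum likelihood estimate of the variance settles near sigma^2 (k-1)/k rather than near sigma^2, however many groups are added.

```python
import random


def neyman_scott_mle(n, k, sigma, seed=0):
    """ML estimate of the common variance with one unknown mean per group."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        mu_i = rng.uniform(-5.0, 5.0)   # incidental parameter of this group
        xs = [rng.gauss(mu_i, sigma) for _ in range(k)]
        mean = sum(xs) / k              # ML estimate of mu_i
        total += sum((x - mean) ** 2 for x in xs)
    return total / (n * k)


k, sigma = 2, 1.0
sigma2_hat = neyman_scott_mle(n=20000, k=k, sigma=sigma)
# Dividing by (k - 1) / k removes the asymptotic bias in this normal example.
sigma2_adjusted = sigma2_hat * k / (k - 1)
print(sigma2_hat, sigma2_adjusted)
```

With k = 2 the unadjusted estimate is close to one half of the true variance, even with 20000 groups, because every new group brings its own unknown mean.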

Conditional Maximum Likelihood (CML) 

This method is based on the fact that in the Rasch model
a 'sufficient' statistic for the incidental parameter
$\theta_i$ exists, namely the number of items answered
correctly by person i.

The concept of sufficiency, as introduced by Fisher
(15 0), was based on the fact that some part of the data
carries no information about the unknown distribution and
that therefore X can be replaced by some statistic T=T(X)
without loss of information. Many nice features of
sufficiency can now be derived; all of these are based on the
fact that for making inferences one can confine oneself to a
sufficient statistic.

Sufficiency as defined by Andersen (1970, 1973), however,
does not necessarily have the same features. Basically,
Andersen's definition of sufficiency is an extension of
Fisher's earlier concept of sufficiency. Since these two
definitions of sufficiency are not equivalent, all results
that are derived from Andersen's new definition should be
carefully checked. This has been done by Andersen (1973) in
most cases; only in the case of the principle of
"information" are there some discrepancies. For instance, in
the Rasch model, if one conditions on the total score of
person v, one can show that no information about that
person's ability is lost, but there seem to be reasons to
believe that this is not true for the information about the
items, i.e., by proceeding in this way one discards
information (Engelen, forthcoming 1988).

For the special case of the Rasch model, Andersen's
notion of sufficiency means that the total score of person v
is a sufficient statistic for the ability $\theta$ of that
person in the presence of the item parameter $\epsilon$. Note
the contrast with the ordinary principle of sufficiency, where
the total score of person v and the number correct on item i
are (jointly) sufficient statistics for the person parameter
$\theta$ and the item parameter $\epsilon$. Denote the total
score statistic for person v as $T_v$. The conditional
probability for the score pattern $x_v$, given $T_v = t_v$,
can now be derived:

(5) $P(X_v = x_v \mid T_v = t_v) = P(X_v = x_v, T_v = t_v) / P(T_v = t_v)$.

Noting that $P(T_v = t_v) = \sum_{y: \sum_i y_i = t_v} P(X_v = y)$
and that $P(X_v = x_v, T_v = t_v) = P(X_v = x_v)$, the
probability statement in (5) can be rewritten into

(6) $P(X_v = x_v \mid T_v = t_v) = \prod_{i=1}^{k} \epsilon_i^{x_{vi}} / \sum_{y: \sum_i y_i = t_v} \prod_{i=1}^{k} \epsilon_i^{y_i}$.
In this form, the likelihood function (6) contains no person
parameters anymore, and estimates of the item parameters can
be obtained by ordinary maximum likelihood. Andersen
(1970, 1973) showed that these estimates are consistent and
asymptotically normally distributed. Starting from (6),
i.e., regarding (6) as a model in itself, no problems would
be encountered with maximum likelihood estimation; for
example, the item parameters would be estimated correctly. A
rationale for (6) as a model cannot, however, be given.

An important drawback of the CML estimation procedure
might be that examinees with all items correct or all items
wrong have to be eliminated from the sample, since in that
case no conditional item estimates can be obtained. That no
estimates exist for these persons can easily be seen from
(6), since in that case both sides are equal to one. The
only conclusion that we can draw now is that for examinees
with all items correct (wrong), the items were all too easy
(difficult).

The denominator in (6) is termed an elementary symmetric
function. The evaluation of these functions is a tedious
task, and was for a long time possible only for a small
number of items (Hambleton & Swaminathan, 1985). In a paper
by Verhelst et al. (1984), it is shown that no serious
problems are encountered anymore, and that one can now
handle as many as 1000 items.
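
For a small number of items the elementary symmetric functions can be obtained with a simple summation recursion. The sketch below uses that textbook recursion; it is not claimed to be the algorithm of Verhelst et al. (1984), and for very large numbers of items a numerically more careful scheme would be needed. It assumes that the conditional probability of a pattern equals the product of the easiness parameters of the correctly answered items, divided by the elementary symmetric function of the order of the total score. The function names are hypothetical.

```python
def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for the item parameters eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for m, e in enumerate(eps, start=1):
        # Descending order, so that each item enters every product at most once.
        for t in range(m, 0, -1):
            gamma[t] += e * gamma[t - 1]
    return gamma


def conditional_prob(x, eps):
    """Probability of pattern x given its total score; free of the ability."""
    numerator = 1.0
    for xi, e in zip(x, eps):
        if xi == 1:
            numerator *= e
    return numerator / elementary_symmetric(eps)[sum(x)]


gammas = elementary_symmetric([1.0, 2.0, 3.0])
print(gammas)                                          # [1.0, 6.0, 11.0, 6.0]
print(conditional_prob([1, 0, 1], [1.0, 2.0, 3.0]))    # 3/11
```

For an examinee with all items correct or all items wrong the function returns one, whatever the item parameters, which is why such examinees carry no information in the conditional likelihood.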

After estimates of the item parameters have been
obtained, one can estimate the person parameters by
considering these estimates as the true values, substituting
these values in the likelihood (3), and obtaining maximum
likelihood estimates of the person parameters in the usual
way. Since the number of persons is usually large, so that
the item parameter estimates have a very small standard
error, treating the estimated values as known seems
appropriate. The precise effect of this procedure is,
however, not yet known. The effects of replacing the true
item parameter values by estimated values will be analyzed
with the help of a simulation study (Engelen, forthcoming
1988).

Marginal Maximum Likelihood (MML)

The person parameters are now regarded as independent
and identically distributed random variables. In other words,
it is assumed that there exists a distribution function of
ability F and that persons are exchangeable, i.e., the ability
of a randomly drawn person is an outcome of this
distribution. For the Rasch model, we can evaluate the
probability of a score pattern x, given the population of
interest, by integrating the probability (2) over the
population distribution $dF(\theta)$:

(7) $P(X = x \mid F, \epsilon) = \int_{0}^{\infty} P(X = x \mid z, \epsilon) \, dF(z)$.

The integral in (7) is evaluated as a Stieltjes integral; if
there exists a derivative f of F, then dF(z) can be replaced by
f(z)dz and we have an ordinary Riemann integral.

In this marginal likelihood function, no person
parameters are present anymore, since they have been
integrated out. Hence, (7) is a function of the item
parameters $\epsilon_1, \ldots, \epsilon_k$ and the ability
distribution function F only.
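
When F is a step function, the Stieltjes integral in the marginal probability reduces to a finite weighted sum. The sketch below evaluates it for such a discrete ability distribution, assuming the multiplicative form z * eps / (1 + z * eps) of the model; the nodes, weights, item parameters and function names are hypothetical values chosen for illustration.

```python
from itertools import product


def pattern_prob(z, eps, x):
    """Probability of response pattern x for ability z under the Rasch model."""
    p = 1.0
    for e, xi in zip(eps, x):
        p_i = z * e / (1.0 + z * e)
        p *= p_i if xi == 1 else 1.0 - p_i
    return p


def marginal_prob(x, eps, nodes, weights):
    """Marginal probability of x when F has jumps `weights` at `nodes`."""
    return sum(w * pattern_prob(z, eps, x) for z, w in zip(nodes, weights))


eps = [0.5, 1.0, 2.0]        # hypothetical item easiness parameters
nodes = [0.25, 1.0, 4.0]     # hypothetical ability values
weights = [0.3, 0.4, 0.3]    # hypothetical probabilities, summing to one

# The marginal probabilities of all 2^k patterns form a proper distribution.
total = sum(marginal_prob(x, eps, nodes, weights)
            for x in product([0, 1], repeat=len(eps)))
print(marginal_prob((1, 0, 1), eps, nodes, weights), total)
```

No person parameter appears among the arguments of `marginal_prob`: only the item parameters and the description of F remain.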

Substituting (2) into the marginal probability function
and rearranging leads to:

(8) $P(X = x \mid F, \epsilon) = \int_{0}^{\infty} \prod_{i=1}^{k} (z\epsilon_i)^{x_i} / \prod_{i=1}^{k} (1 + z\epsilon_i) \, dF(z)$.

Substituting $b_x = \prod_{i=1}^{k} \epsilon_i^{x_i}$ and
$D_x(z, \epsilon) = z^{t_x} / \prod_{i=1}^{k} (1 + z\epsilon_i)$,
with $t_x = \sum_{i=1}^{k} x_i$ the total score of pattern x,
the marginal probability function for the responses of all N
examinees is given by:

(9) $P(X_1 = x_1, \ldots, X_N = x_N \mid F, \epsilon) = \prod_{v=1}^{N} P(X_v = x_v \mid F, \epsilon) = \prod_{x} \{ b_x \int_{0}^{\infty} D_x(z, \epsilon) \, dF(z) \}^{M_x}$,

where $M_x$ is the number of examinees with response pattern x.

From this starting point, a few different routes have been
followed. First, one can assume that F belongs to a special
parametric family, indexed by a parameter $\phi$. Then, one
can estimate $\phi$ along with the item parameters
$\epsilon_1, \ldots, \epsilon_k$. A common choice for F has
been a lognormal distribution with mean
$\exp(\mu + \frac{1}{2}\sigma^2)$ and variance
$\{\exp(\sigma^2) - 1\} \exp(2\mu + \sigma^2)$ (so
$\phi = (\mu, \sigma^2)$). Recall that a random variable Y is
lognormally distributed with mean
$\exp(\mu + \frac{1}{2}\sigma^2)$ and variance
$\{\exp(\sigma^2) - 1\} \exp(2\mu + \sigma^2)$ if log Y is
normally distributed with mean $\mu$ and variance $\sigma^2$.

Good results with this ability distribution were
obtained by Thissen (1982), Andersen & Madsen (1977), Mislevy
(1984) and Sanathanan & Blumenthal (1978). Most of these
authors assumed that the item parameters were known
beforehand, so that only $\mu$ and $\sigma^2$ had to be
estimated.

Note, however, that the form of the ability distribution
need not be known beforehand. Hence, this method lacks a
basic common sense interpretation.

Therefore, one can try to estimate the ability
distribution jointly with the item parameters. For this
purpose, Bock & Aitkin (1981) used a discrete distribution
over a finite number of points and called this histogram the
empirical distribution. Although they claim that they have
thereby freed the marginal maximum likelihood procedure from
arbitrary assumptions about the ability distribution, this is
not true. Since they use preassigned values for the nodes of
the ability distribution function, and since these nodes are
not changed during the iteration process used to estimate the
ability distribution function and the item parameters, Bock
and Aitkin are actually working in the parametric setting
again. De Leeuw & Verhelst (1986) and Engelen (1987) showed
that one can in fact estimate the ability distribution
function jointly with the item parameters. Furthermore, both
showed that this can be done consistently, under certain
suitable regularity conditions. The ability distribution
function turns out to be a step function, where the number of
steps is a function of the number of items only. All this
will be discussed in more detail in the third section.

Bayesian Estimation

In the Bayesian framework, one starts by imposing
reasonable prior distributions on the parameters of
interest. Reasonable in this context should be understood as
referring to ease of computation or to belief in the
particular prior chosen. Then, using Bayes' rule, one can,
having observed the data, compute the a posteriori
distribution. This a posteriori distribution will then be
used as the basis for further inference.

Bayesian estimation always improves on maximum
likelihood estimation if it is reasonable to assume that one
or more subsets of parameters can be considered as
exchangeable members of corresponding populations. If no
prior information is available, and one uses a
non-informative prior for a parameter, i.e., a flat (uniform)
one, then Bayesian estimation is equivalent to maximum
likelihood estimation.

Historically, Bayesian estimation started with the
specification of a parametric prior distribution. Later on,
this changed into the specification of empirical priors,
i.e., priors that are estimated from the data, and
hierarchical Bayesian estimation, where a prior is specified
for the parameters in the prior distribution. The latter has
one clear advantage: hierarchical Bayes is far more flexible
than ordinary Bayes estimation.

In principle, one can distinguish three different
Bayesian estimation procedures in the Rasch model: (i) both
item and person parameters are subject to prior information,
i.e., prior distributions for item as well as person
parameters are assumed; (ii) only a prior distribution for
the person parameters is specified; and (iii) only a prior
distribution for the item parameters is specified.

Procedure (iii) has never been used in the Rasch model
before, since in most applications the item parameters are
known beforehand or are believed to be estimated reasonably
well by one of the maximum likelihood procedures. This
restricts the discussion to the first two procedures.

First, we will discuss the first procedure, since the
second one can be seen as a special case. We will do this for
hierarchical Bayes estimation. The starting point for the
analysis is the likelihood function (3). Using Bayes' rule,
the posterior distribution f of the observed data and all the
parameters is proportional to the product of this likelihood
and the prior distribution g of the parameters:

(10) $f(X, \theta, \epsilon) \propto L(X \mid \theta, \epsilon) \, g(\theta, \epsilon)$.



Now one has to choose a prior distribution for the item and
person parameters. Swaminathan and Gifford (1982) show that
the analysis can be effectively reduced if one makes the
reasonable assumption that the item and person parameters are
independently distributed. They also assume that the
distributions for item and person parameters have the same
form (both distributions multivariate lognormal), a standard
approach, but nevertheless not free of criticism. So, we have

(11) $\log \theta_v \sim N(\mu_\theta, \phi_\theta)$; $\log \epsilon_i \sim N(\mu_\epsilon, \phi_\epsilon)$.

To complete the hierarchical Bayes structure, prior
distributions for the so-called hyperparameters
$\mu_\theta, \phi_\theta, \mu_\epsilon, \phi_\epsilon$ have
to be specified. For the means $\mu_\theta$ and
$\mu_\epsilon$, a flat uniform prior is chosen, and since
$\phi_\theta$ and $\phi_\epsilon$ are variances, inverse
$\chi^2$ distributions with parameters $\nu$ and $\lambda$
seem appropriate. Note that these are conjugate priors.
Finally, Swaminathan and Gifford showed that reasonable
values for the hyperparameters are between 5 and 15 for $\nu$
and about 10 for $\lambda$. Working all this out, they obtain
the posterior distribution, which they use as a basis for
further inference. For more specific details, see Swaminathan
and Gifford (1982).

Note that the classical objection against Bayesian
procedures applies in this case also: no empirical evidence
for the choice of the priors is given. On the other hand,
considering the flexibility of hierarchical Bayes estimation,
this need not be a serious problem.

Another approach is given by Mislevy (1986), who uses
the same structure for the item parameters, but changes the
prior of the person parameters. For the person parameters,
Mislevy offers a choice between a nonparametric prior in the
form of a histogram and a mixture of normal components.
Again, natural conjugate priors are chosen for the
hyperparameters. Note that the term nonparametric is
misplaced; the nodes of the histogram are fixed in advance
and are not estimated from the data. See also Engelen (1987)
for a discussion of this in the marginal model.

The results of Mislevy (1986) and Swaminathan and
Gifford (1982, 1986) show that hierarchical Bayesian
estimation yields good results. This is especially true for
the case of the three-parameter logistic item response
models, where maximum likelihood estimation performs rather
badly, even for a very large number of examinees.

Minimum Chi-Square Estimation 

Another estimation procedure has been proposed by
Fischer and Scheiblechner (1970) and Fischer (1974):
minimum chi-square estimation. This procedure starts with the
observation that

(12) $n_{ij} / n_{ji} \approx \epsilon_i / \epsilon_j$,

where $n_{ij}$ stands for the number of examinees that respond
correctly to item i but incorrectly to item j. With the
easier notation

(13) $\delta_i = 1 / \epsilon_i$,

Fischer uses

(14) $\sum_{i<j} (n_{ij}\delta_i - n_{ji}\delta_j)^2 / \{ \delta_i\delta_j (n_{ij} + n_{ji}) \}$

as a chi-square criterion. Now, (14) is minimized with
respect to the item parameters $\delta_i$, which yields
estimates of these parameters. Subsequently, the person
parameters can be estimated in the same way as with
conditional maximum likelihood, by using the estimated values
of the item parameters as the true ones, and maximizing the
resulting likelihood expression.

An advantage of this method is its speed.
Furthermore, although the $n_{ij}$ are dependent, Fischer and
Scheiblechner (1970) claim, as a result of their simulation
studies, that (14) is approximately distributed as chi-square.
That this is true in the general case has, however, never
been shown; neither has the contrary.
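
The ingredients of this procedure are easily computed. The sketch below counts, for every pair of items, the examinees who answered the first item correctly and the second incorrectly, and evaluates a chi-square criterion for given item parameters delta. It assumes that the criterion has the Pearson form, i.e. the squared difference n_ij * delta_i - n_ji * delta_j divided by delta_i * delta_j * (n_ij + n_ji), summed over all pairs i < j; this form, the function names and the tiny data set are assumptions made for illustration. The minimization itself is not shown.

```python
def pairwise_counts(data):
    """n[i][j] = number of examinees with item i correct and item j incorrect."""
    k = len(data[0])
    n = [[0] * k for _ in range(k)]
    for row in data:
        for i in range(k):
            for j in range(k):
                if row[i] == 1 and row[j] == 0:
                    n[i][j] += 1
    return n


def chi_square_criterion(n, delta):
    """Pearson-type criterion summed over all item pairs i < j."""
    total = 0.0
    k = len(delta)
    for i in range(k):
        for j in range(i + 1, k):
            m = n[i][j] + n[j][i]
            if m > 0:   # pairs without any discordant responses contribute nothing
                diff = n[i][j] * delta[i] - n[j][i] * delta[j]
                total += diff ** 2 / (delta[i] * delta[j] * m)
    return total


# Hypothetical responses of four examinees to two items.
data = [[1, 0], [1, 0], [0, 1], [1, 1]]
counts = pairwise_counts(data)
print(counts[0][1], counts[1][0])
print(chi_square_criterion(counts, [1.0, 2.0]))   # delta ratio matches the counts
print(chi_square_criterion(counts, [1.0, 1.0]))
```

In this small example the criterion vanishes when delta_2 / delta_1 equals n_12 / n_21, in line with the observation on which the procedure is based.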

Paired Comparison Estimation



In this method, the Rasch model is rewritten as a model
for paired comparisons with ties (Bradley, 1976). In the
latter method one compares the responses of a subject
responding to a pair of items. Therefore, the Rasch model is
rewritten in the following way:

(15) $P(X_{vi} = 1 \mid \theta_v, \delta_i) = \theta_v (\theta_v + \delta_i)^{-1}$.

This is done by substituting $\delta_i = 1/\epsilon_i$ and a
simple rewriting of (1). For a pair of items (i, j), one can
now consider the four possible patterns of an examinee v.
These patterns, then, give information about the relative
difficulties of the two items for that examinee. In other
words, one considers these patterns as the outcomes of a
paired comparison experiment. In that case, three basically
different outcomes can be distinguished:

$\{X_{vi} > X_{vj}\}$   item i correct and item j not
$\{X_{vi} < X_{vj}\}$   item j correct and item i not
$\{X_{vi} = X_{vj}\}$   both items correct or both items
                        incorrect.



The first outcome can now be interpreted as a comparison
showing that, for examinee v, item i is likely to be
easier than item j. The other outcomes are interpreted
analogously.

The probabilities for the possible outcomes can now be
evaluated for the Rasch model:

(16) $P(X_{vi} > X_{vj}) = \theta_v\delta_j [(\theta_v + \delta_i)(\theta_v + \delta_j)]^{-1}$,
     $P(X_{vi} < X_{vj}) = \theta_v\delta_i [(\theta_v + \delta_i)(\theta_v + \delta_j)]^{-1}$,
     $P(X_{vi} = X_{vj}) = (\theta_v^2 + \delta_i\delta_j) [(\theta_v + \delta_i)(\theta_v + \delta_j)]^{-1}$.

If one now conditions on the event of a non-tie, or
equivalently on the event that the total test score for the
two items is one, the result is the Bradley-Terry model from
the paired comparison literature:

(17) $P(X_{vi} > X_{vj} \mid X_{vi} \neq X_{vj}) = \delta_j (\delta_i + \delta_j)^{-1} = \pi_{ij}$.
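
That the conditioning removes the person parameter can be verified numerically. The sketch below assumes the outcome probabilities that follow from the model with delta = 1/epsilon, i.e. theta / (theta + delta) for a correct response; the parameter values and function names are hypothetical.

```python
def outcome_probs(theta, delta_i, delta_j):
    """Probabilities of the three outcomes for one examinee and one item pair."""
    denom = (theta + delta_i) * (theta + delta_j)
    greater = theta * delta_j / denom                  # item i correct, item j not
    less = theta * delta_i / denom                     # item j correct, item i not
    tie = (theta ** 2 + delta_i * delta_j) / denom     # both correct or both wrong
    return greater, less, tie


def conditional_on_non_tie(theta, delta_i, delta_j):
    """Probability that item i is the correct one, given a non-tie."""
    greater, less, _ = outcome_probs(theta, delta_i, delta_j)
    return greater / (greater + less)


delta_i, delta_j = 0.5, 2.0
values = [conditional_on_non_tie(t, delta_i, delta_j) for t in (0.1, 1.0, 10.0)]
print(values)   # the same value, delta_j / (delta_i + delta_j), for every ability
```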

Note that in equation (17) the person parameter $\theta_v$ has
disappeared; for any examinee the probability described in
(17) is independent of that examinee's ability. This means
that the likelihood for a comparison of two items takes the
form

(18) $L(\delta_i, \delta_j \mid a_{ij}, a_{ji}) = [\delta_j (\delta_i + \delta_j)^{-1}]^{a_{ij}} [\delta_i (\delta_i + \delta_j)^{-1}]^{a_{ji}}$,

where $a_{ij}$ is the number of times $\{X_{vi} > X_{vj}\}$ is
observed. For n items, however, the outcomes of the
comparisons $\{X_{vi} > X_{vj}\}$ and $\{X_{vk} > X_{vi}\}$
are not independent. It is shown by van der Linden and Eggen
(forthcoming, September 1987) that the number of independent
comparisons for an examinee with total test score t on n
items is given by min{t, n-t}. Denoting the set of
independent comparisons for a set of n items by J, the
likelihood for this set is given by

(19) $L(\delta_1, \ldots, \delta_n \mid a_{ij}) = \prod_{(i,j) \in J} [\delta_j (\delta_i + \delta_j)^{-1}]^{a_{ij}} [\delta_i (\delta_i + \delta_j)^{-1}]^{a_{ji}}$.

An iterative algorithm for obtaining maximum likelihood
estimates has already been given in the general paired
comparisons setting, independently by Zermelo (1929) and Ford
(1957). Furthermore, they showed that these maximum
likelihood estimates exist and are unique if the following
necessary and sufficient condition is satisfied: for every
partition of the set of items into two non-empty subsets, for
some item i in the first set and some item j in the second
one, the outcome $\{X_{vi} > X_{vj}\}$ has occurred for at
least one value of v. That this is a weak condition, which is
almost always satisfied, has been shown by Fischer (1981),
who found the same condition for the existence and uniqueness
of conditional maximum likelihood estimates. In contrast with
conditional maximum likelihood estimation, this method is not
limited to a small number of items, since elementary
symmetric functions of order greater than two do not have to
be calculated in this approach. Note that since the item
parameters are estimated by maximum likelihood, they are
estimated consistently.
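
An iteration of this kind can be sketched as follows. The code works with the easiness parameters epsilon = 1/delta, so that the probability that item i "wins" a non-tied comparison with item j is eps_i / (eps_i + eps_j). It uses a simultaneous update of all parameters, a common variant of the classical fixed-point scheme and not necessarily the exact algorithm of Zermelo or Ford. The product of the parameters is normalized to one, every item is assumed to have at least one win, and all supplied comparisons are treated as independent. The function name and the counts are hypothetical.

```python
import math


def bradley_terry(a, iterations=1000):
    """Fixed-point iteration for Bradley-Terry strengths.

    a[i][j] is the number of non-tied comparisons in which item i was
    answered correctly and item j was not.
    """
    k = len(a)
    eps = [1.0] * k
    wins = [sum(a[i]) for i in range(k)]
    for _ in range(iterations):
        eps = [wins[i] / sum((a[i][j] + a[j][i]) / (eps[i] + eps[j])
                             for j in range(k) if j != i)
               for i in range(k)]
        # Normalize so that the product of the parameters equals one.
        c = math.exp(sum(math.log(e) for e in eps) / k)
        eps = [e / c for e in eps]
    return eps


# Hypothetical counts for two items: item 1 "wins" three times, item 2 once.
eps_hat = bradley_terry([[0, 3], [1, 0]])
print(eps_hat, eps_hat[0] / eps_hat[1])
```

For two items the estimated ratio of the easiness parameters equals the ratio of the two counts, which agrees with the starting point of the minimum chi-square procedure.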

After item parameters have been obtained, the person
parameters can be obtained by maximum likelihood estimation 
where the real values of the item parameters are replaced by 
their estimates. 



More on the Marginal Rasch Model 

In this section we will compare the different estimation
procedures in greater detail and mention some other features
of the marginal Rasch model that have not been discussed
before.

First, we have to explain why one wants to use the marginal
Rasch model instead of the Rasch model itself. As explained
before, one cannot use the Rasch model in combination with
joint maximum likelihood estimation, since the resulting item
parameter estimates are not consistent. There remains the
possibility of the conditional model. The main reason not to
use the conditional model is the loss of information
(mentioned earlier).

Secondly, the marginal Rasch model can be seen as a model in
its own right, just like the unconditional Rasch model, and
was introduced as such by Cressie and Holland (1983). In
doing so, Cressie and Holland used the notion of manifest
probabilities (Lazarsfeld & Henry, 1968), i.e., the
proportions of examinees in a given population who obtain
each particular pattern of right and wrong responses. These
manifest probabilities can be explained by an (unobservable)
latent trait model if that model correctly predicts the
data. They then show that the Rasch model is a model that
can predict these manifest probabilities correctly, provided
that the data satisfy certain conditions. These conditions
will be discussed later. Note that no such rationale exists
for the conditional Rasch model.
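
As a reading aid, the manifest probability of a response pattern can be
written out explicitly. The display below is a sketch under two
assumptions: that equation (2) is the product of the item response
probabilities with easiness parameters, and that G denotes the
(unspecified) ability distribution function in the population, a symbol
introduced here only for illustration.

```latex
p(x) \;=\; \int \prod_{i=1}^{k}
  \frac{e^{\,x_i(\theta+\epsilon_i)}}{1+e^{\,\theta+\epsilon_i}}
  \, dG(\theta),
\qquad x=(x_1,\ldots,x_k)\in\{0,1\}^{k}.
```

The latent trait model "explains" the data if these integrals reproduce
the observed proportions of all 2^k response patterns.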

Furthermore, it is not at all clear which of the properties
derived for conditional maximum likelihood estimation by
Andersen (1970, 1973) really hold. How well do the two
different conceptions of sufficiency, as given by Fisher and
by Andersen, match? Another reason is that the conditional
model is applicable only to the one-parameter logistic model
and not to the two- or three-parameter logistic models. The
reason for the latter is that in the multi-parameter logistic
models no simple sufficient statistics for ability exist.
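
The role of sufficiency can be checked numerically. The sketch below
assumes the easiness parametrization of equation (1), with success
probability depending on ability plus easiness. For the two-parameter
case it uses a hypothetical parametrization in which this sum is
multiplied by an item discrimination a_i. The function names are
illustrative only. Under the Rasch model the probability of a pattern
given the raw score does not depend on ability; with unequal
discriminations it does.

```python
from itertools import product

import numpy as np


def pattern_prob(x, theta, eps, a=None):
    """P(pattern x | theta); a=None gives the Rasch model (all a_i = 1)."""
    x = np.asarray(x)
    eps = np.asarray(eps, dtype=float)
    a = np.ones_like(eps) if a is None else np.asarray(a, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta + eps)))
    return float(np.prod(np.where(x == 1, p, 1.0 - p)))


def prob_given_score(x, theta, eps, a=None):
    """P(pattern x | raw score of x, theta), by enumerating all patterns."""
    r = sum(x)
    same_score = [y for y in product((0, 1), repeat=len(x)) if sum(y) == r]
    norm = sum(pattern_prob(y, theta, eps, a) for y in same_score)
    return pattern_prob(x, theta, eps, a) / norm


eps = [0.5, -0.3, 1.0]          # hypothetical easiness parameters
x = (1, 0, 1)

# Rasch model: the two values agree, whatever the ability.
rasch_low = prob_given_score(x, -1.0, eps)
rasch_high = prob_given_score(x, 2.0, eps)

# Two-parameter model: the conditional probability changes with ability.
a = [0.5, 1.0, 2.0]             # hypothetical discriminations
twopl_low = prob_given_score(x, -1.0, eps, a)
twopl_high = prob_given_score(x, 2.0, eps, a)
```

Because the raw score no longer removes ability from the likelihood in
the second case, conditioning on it cannot be used to eliminate the
person parameters.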

Next, we shall discuss some advantages and disadvantages
of marginal maximum likelihood estimation in the Rasch model
in comparison with the other maximum likelihood approaches
and with the minimum chi-square and pairwise comparison
approaches. First, no persons have to be eliminated from the
data to be able to obtain estimates for the item parameters.
In the other maximum likelihood approaches, persons with all
items correct or all items wrong, as well as items that have
been answered correctly by all examinees, have to be
eliminated from the initial data set. Also, in the minimum
chi-square and in the pairwise comparison approach, one
cannot use the complete data set for estimation. This has
the effect that not all the available information in the
data set is used. Secondly, marginal maximum likelihood
estimation is also applicable in the two- and
three-parameter logistic models, while the others are
not. Unconditional maximum likelihood estimation does not
work in multi-parameter logistic models, since the estimates
of the guessing parameter drift out of bounds (Mislevy,
1986). The unconditional maximum likelihood, the minimum
chi-square and the pairwise comparison approaches do not
apply in multi-parameter logistic item response models, since
the notion of sufficiency, which is the common basis for all
these estimation procedures, is violated in these models
(Fischer, 1974). The main disadvantage of the marginal
approach is that no estimates of the person parameters are
obtained; only information about the distribution of ability
is obtained. Note that the main purpose of a test is often to
get information about the ability of the examinees taking the
test. With marginal maximum likelihood estimation, this
information is not available; only the ability distribution
function can be estimated. However, this estimated ability
distribution could be used, for example, as an instrument to
measure the differences between different schools or
different curricula.

It is further important to note that marginal maximum
likelihood estimation yields consistent item parameter
estimates and that, together with the estimated distribution
function of ability, one achieves a reasonable fit in most
cases.
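
To make the marginal likelihood concrete, the following sketch
approximates the marginal probability of a response pattern by
Gauss-Hermite quadrature. It rests on assumptions that are not part of
the discussion above: the ability distribution is taken to be standard
normal purely for illustration (a parametric choice), the item response
function is assumed to depend on ability plus easiness as in equation
(1), and the function name and the number of nodes are hypothetical.

```python
from itertools import product

import numpy as np


def marginal_pattern_prob(x, eps, n_nodes=41):
    """Approximate p(x) = integral of P(x | theta) phi(theta) d theta.

    phi is the standard normal density (an illustrative assumption).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    theta = np.sqrt(2.0) * nodes        # change of variable for N(0, 1)
    w = weights / np.sqrt(np.pi)        # rescaled weights sum to one
    x = np.asarray(x)
    eps = np.asarray(eps, dtype=float)
    # p[q, i]: probability of a correct answer to item i at node q
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] + eps[None, :])))
    lik = np.prod(np.where(x[None, :] == 1, p, 1.0 - p), axis=1)
    return float(w @ lik)


eps = [0.5, -0.3, 1.0]                  # hypothetical easiness parameters
patterns = list(product((0, 1), repeat=len(eps)))
probs = {x: marginal_pattern_prob(x, eps) for x in patterns}

# The marginal log-likelihood of a sample is the sum of log p(x) over the
# observed patterns, including the all-correct and all-wrong patterns.
sample = [(1, 1, 1), (0, 0, 0), (1, 0, 1)]
log_lik = sum(np.log(probs[x]) for x in sample)
```

Patterns with all items correct or all items wrong have finite, positive
marginal probabilities, so such examinees contribute to the likelihood
and need not be removed from the data.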

Compared with modal Bayes estimation, the marginal
maximum likelihood approach yields the same results and is
hence equivalent. This holds subject to the constraint that
no prior distribution is put on the item parameters.

The conditions on the manifest probabilities as given by
Cressie and Holland (1983) are exactly the same as the
conditions that de Leeuw and Verhelst needed to be able to
estimate the (empirical) ability distribution function. The
conditions of Engelen (1987) only show that it is possible to
estimate the ability distribution consistently; it is not
proven that these estimates exist. To be able to do this, one
needs additional constraints like the ones given in de Leeuw
and Verhelst.

It is important to stress the fact that one should use
empirical marginal maximum likelihood, or equivalently,
empirical Bayes estimation, instead of the parametric
approach. This is necessary since one never has an exact
indication of the true form of the ability distribution
function; therefore this function should be estimated
empirically.

Although marginal maximum likelihood estimation seems
the most appropriate one, serious numerical problems exist,
especially for distribution-free marginal maximum likelihood
estimation.